Pareto distribution (pareto) — heavy-tailed power laws#

The Pareto (Type I) distribution is a continuous distribution on \((x_m, \infty)\) with a power-law tail. It is the canonical model behind the Pareto principle (“80/20 rule”) and appears whenever a small number of observations dominate totals.

What you’ll learn#

  • how the PDF/CDF encode a power-law tail and a simple quantile function

  • which moments exist (and why some are infinite)

  • mean/variance/skewness/kurtosis, entropy, and what happens to the MGF

  • a clean NumPy-only sampler (inverse transform) and Monte Carlo validation

  • maximum likelihood estimation (MLE) and how it relates to exponentials on the log scale

  • practical usage via scipy.stats.pareto (pdf, cdf, rvs, fit)

import numpy as np

import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

import scipy
from scipy import stats
from scipy.integrate import quad

# Plotly rendering (CKC convention)
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

# Reproducibility
rng = np.random.default_rng(7)
np.set_printoptions(precision=6, suppress=True)

print("numpy:", np.__version__)
print("scipy:", scipy.__version__)
numpy: 1.26.2
scipy: 1.15.0

1) Title & Classification#

  • Name: pareto (Pareto Type I; SciPy: scipy.stats.pareto)

  • Type: Continuous

  • Support: \(x \in [x_m, \infty)\)

  • Parameter space:

    • shape (tail index) \(\alpha > 0\)

    • scale / minimum \(x_m > 0\)

We write:

\[X \sim \mathrm{Pareto}(\alpha, x_m).\]

SciPy parameterization (mapping)

SciPy’s stats.pareto(b, loc=0, scale=1) uses a shape parameter b and then applies an affine transformation. With loc=0 and scale=x_m, SciPy’s b corresponds to our \(\alpha\).
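A minimal sketch (parameter values assumed for illustration) that sanity-checks this mapping against the textbook density:

```python
import numpy as np
from scipy import stats

# Check the mapping: with loc=0 and scale=xm, SciPy's b plays the role of
# alpha in the textbook density alpha * xm^alpha * x^(-(alpha+1)).
alpha, xm = 2.5, 3.0
x = np.array([3.0, 4.0, 10.0])

scipy_pdf = stats.pareto(b=alpha, loc=0, scale=xm).pdf(x)
closed_form = alpha * xm**alpha / x ** (alpha + 1)

assert np.allclose(scipy_pdf, closed_form)
```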

2) Intuition & Motivation#

2.1 What it models#

The Pareto distribution models positive quantities with a hard lower bound and a heavy right tail.

A key feature is its scale-free tail:

\[\Pr(X > x) = \left(\frac{x_m}{x}\right)^{\alpha},\qquad x\ge x_m.\]

On a log–log plot, this tail is a straight line with slope \(-\alpha\).
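As a small numeric illustration of that slope (values assumed, not from the text), \(\log \bar F(x)\) regressed on \(\log x\) recovers exactly \(-\alpha\):

```python
import numpy as np

# log P(X > x) = alpha * (log xm - log x) is linear in log x with slope -alpha.
alpha, xm = 1.7, 2.0
x = np.array([4.0, 40.0])

log_sf = alpha * (np.log(xm) - np.log(x))
slope = (log_sf[1] - log_sf[0]) / (np.log(x[1]) - np.log(x[0]))

assert np.isclose(slope, -alpha)
```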

2.2 Typical real-world use cases#

  • Wealth / income above a minimum threshold (upper tail behavior)

  • Insurance claims and large losses (catastrophic tail)

  • City sizes and firm sizes (upper tails)

  • File sizes / network traffic (bursty, heavy-tailed phenomena)

  • Waiting times in systems with rare extreme delays

2.3 Relations to other distributions#

  • Power law / “scale-free”: Pareto Type I is the canonical continuous power-law model.

  • Log transform to exponential: if \(X\sim\mathrm{Pareto}(\alpha,x_m)\) then \(\log(X/x_m) \sim \mathrm{Exp}(\text{rate}=\alpha)\). This is extremely useful for inference.

  • Lomax (Pareto II): if \(X\sim\mathrm{Pareto}(\alpha,x_m)\) then \(X-x_m\) follows a Lomax distribution.

  • Generalized Pareto (GPD): Pareto Type I arises as a special case of the GPD (positive shape parameter), which underlies peaks-over-threshold modeling in extreme value theory.
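The Lomax relation can be checked directly with SciPy (a sketch with assumed parameter values): the shifted density must agree pointwise with `scipy.stats.lomax`.

```python
import numpy as np
from scipy import stats

# If X ~ Pareto(alpha, xm), then X - xm is Lomax(alpha) with scale xm:
# (alpha/xm) * (1 + y/xm)^(-(alpha+1)) evaluated at y = x - xm equals the
# Pareto density at x.
alpha, xm = 2.0, 1.5
x = np.geomspace(xm, 20 * xm, 50)

pareto_pdf = stats.pareto(b=alpha, scale=xm).pdf(x)
lomax_pdf = stats.lomax(c=alpha, scale=xm).pdf(x - xm)

assert np.allclose(pareto_pdf, lomax_pdf)
```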

3) Formal Definition#

Let \(\alpha>0\) and \(x_m>0\). The PDF is

\[ f(x\mid\alpha,x_m) = \alpha\,x_m^{\alpha}\,x^{-(\alpha+1)}\,\mathbf{1}\{x\ge x_m\}. \]

The CDF is

\[\begin{split} F(x\mid\alpha,x_m)= \begin{cases} 0, & x < x_m,\\ 1 - \left(\frac{x_m}{x}\right)^{\alpha}, & x\ge x_m. \end{cases} \end{split}\]

The survival function (tail probability) is

\[\bar F(x) = 1-F(x) = \left(\frac{x_m}{x}\right)^{\alpha},\qquad x\ge x_m.\]

The quantile function (inverse CDF) for \(0<u<1\) is

\[Q(u) = x_m (1-u)^{-1/\alpha}.\]
def pareto_pdf(x: np.ndarray, alpha: float, xm: float) -> np.ndarray:
    '''Pareto (Type I) PDF. Returns 0 for x < xm.'''
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    xm = float(xm)

    if alpha <= 0 or xm <= 0:
        raise ValueError("Pareto parameters must satisfy alpha>0 and xm>0")

    out = np.zeros_like(x, dtype=float)
    mask = x >= xm
    # Use log space for stability
    log_pdf = np.log(alpha) + alpha * np.log(xm) - (alpha + 1) * np.log(x[mask])
    out[mask] = np.exp(log_pdf)
    return out


def pareto_sf(x: np.ndarray, alpha: float, xm: float) -> np.ndarray:
    '''Survival function P(X>x).'''
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    xm = float(xm)

    if alpha <= 0 or xm <= 0:
        raise ValueError("Pareto parameters must satisfy alpha>0 and xm>0")

    out = np.ones_like(x, dtype=float)
    mask = x >= xm
    z = alpha * (np.log(xm) - np.log(x[mask]))  # <= 0
    out[mask] = np.exp(z)
    return out


def pareto_cdf(x: np.ndarray, alpha: float, xm: float) -> np.ndarray:
    '''Pareto (Type I) CDF.'''
    x = np.asarray(x, dtype=float)
    alpha = float(alpha)
    xm = float(xm)

    if alpha <= 0 or xm <= 0:
        raise ValueError("Pareto parameters must satisfy alpha>0 and xm>0")

    out = np.zeros_like(x, dtype=float)
    mask = x >= xm
    z = alpha * (np.log(xm) - np.log(x[mask]))  # <= 0
    out[mask] = -np.expm1(z)  # 1 - exp(z), stable when z ~ 0
    return out


def pareto_ppf(u: np.ndarray, alpha: float, xm: float) -> np.ndarray:
    '''Quantile function (inverse CDF) for 0<u<1.'''
    u = np.asarray(u, dtype=float)
    alpha = float(alpha)
    xm = float(xm)

    if alpha <= 0 or xm <= 0:
        raise ValueError("Pareto parameters must satisfy alpha>0 and xm>0")
    if np.any((u <= 0) | (u >= 1)):
        raise ValueError("u must lie strictly in (0,1)")

    # Q(u) = xm * (1-u)^(-1/alpha)
    return xm * np.exp(-np.log1p(-u) / alpha)


def pareto_rvs(alpha: float, xm: float, size: int | tuple[int, ...], rng: np.random.Generator) -> np.ndarray:
    '''NumPy-only Pareto sampler via inverse transform.'''
    u = rng.random(size)
    # u in [0,1); use 1-u in (0,1] so u==0 maps safely to xm.
    return xm * np.exp(-np.log1p(-u) / float(alpha))


# Quick sanity check: PDF integrates to 1
alpha0, xm0 = 2.5, 1.0
area, err = quad(lambda t: alpha0 * xm0**alpha0 / (t ** (alpha0 + 1)), xm0, np.inf)
area, err
(0.9999999999999999, 1.1102230246251563e-15)

4) Moments & Properties#

4.1 Raw moments (and when they exist)#

For \(k<\alpha\) the \(k\)-th raw moment exists and is

\[\mathbb{E}[X^k] = \frac{\alpha\,x_m^k}{\alpha-k},\qquad (k<\alpha).\]

If \(k\ge \alpha\), the integral diverges and \(\mathbb{E}[X^k]=\infty\).
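A one-line quadrature check (illustrative values assumed) confirms the raw-moment formula for a fractional \(k\):

```python
import numpy as np
from scipy.integrate import quad

# E[X^k] for k < alpha: integrate x^k against the density and compare with
# the closed form alpha * xm^k / (alpha - k).
alpha, xm, k = 2.5, 1.0, 1.5

moment, _ = quad(lambda t: t**k * alpha * xm**alpha * t ** (-(alpha + 1)), xm, np.inf)
closed_form = alpha * xm**k / (alpha - k)

assert abs(moment - closed_form) < 1e-8
```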

4.2 Mean, variance, skewness, kurtosis#

For \(X\sim\mathrm{Pareto}(\alpha,x_m)\):

  • Mean (exists for \(\alpha>1\)): \(\mathbb{E}[X] = \frac{\alpha\,x_m}{\alpha-1}\).

  • Variance (exists for \(\alpha>2\)): \(\mathrm{Var}(X) = \frac{\alpha\,x_m^2}{(\alpha-1)^2(\alpha-2)}\).

  • Skewness (exists for \(\alpha>3\)): \(\gamma_1 = \frac{2(\alpha+1)}{\alpha-3}\sqrt{\frac{\alpha-2}{\alpha}}\).

  • Excess kurtosis (exists for \(\alpha>4\)): \(\gamma_2 = \frac{6(\alpha^3+\alpha^2-6\alpha-2)}{\alpha(\alpha-3)(\alpha-4)}\).

Additional useful facts:

  • Mode: \(x_m\)

  • Median: \(x_m\,2^{1/\alpha}\)

4.3 MGF / characteristic function#

  • The MGF \(M(t)=\mathbb{E}[e^{tX}]\) diverges for all \(t>0\) (the tail is polynomial).

  • For \(t<0\), the Laplace transform exists and can be expressed using the upper incomplete gamma function \(\Gamma(s,z)\):

    \[M(t) = \alpha(-t x_m)^{\alpha}\,\Gamma(-\alpha, -t x_m),\qquad t<0.\]
  • The characteristic function exists for all real \(t\) and admits a similar expression:

    \[\varphi(t) = \alpha(-i t x_m)^{\alpha}\,\Gamma(-\alpha, -i t x_m).\]

In practice, these are often evaluated numerically.
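For example, the \(t<0\) case can be checked numerically without any incomplete-gamma code (a sketch with assumed parameter values): quadrature against the density should agree with a Monte Carlo average.

```python
import numpy as np
from scipy.integrate import quad

# E[e^{tX}] for t < 0: both estimates are finite because the exponential
# factor decays; for t > 0 the same integral would diverge.
alpha, xm, t = 2.5, 1.0, -0.5
rng = np.random.default_rng(0)

mgf_quad, _ = quad(
    lambda s: np.exp(t * s) * alpha * xm**alpha * s ** (-(alpha + 1)), xm, np.inf
)

u = rng.random(500_000)
samples = xm * (1 - u) ** (-1 / alpha)  # inverse-transform Pareto draws
mgf_mc = float(np.mean(np.exp(t * samples)))

assert abs(mgf_quad - mgf_mc) < 5e-3
```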

4.4 Entropy#

The differential entropy is

\[H(X) = \log\left(\frac{x_m}{\alpha}\right) + 1 + \frac{1}{\alpha}.\]
def pareto_moments(alpha: float, xm: float) -> dict:
    a = float(alpha)
    xm = float(xm)
    if a <= 0 or xm <= 0:
        raise ValueError("Pareto parameters must satisfy alpha>0 and xm>0")

    mean = np.inf if a <= 1 else a * xm / (a - 1)
    var = np.inf if a <= 2 else a * xm**2 / ((a - 1) ** 2 * (a - 2))
    skew = np.nan if a <= 3 else (2 * (a + 1) / (a - 3)) * np.sqrt((a - 2) / a)
    excess_kurt = (
        np.nan
        if a <= 4
        else 6 * (a**3 + a**2 - 6 * a - 2) / (a * (a - 3) * (a - 4))
    )

    entropy = np.log(xm / a) + 1 + 1 / a
    median = xm * 2 ** (1 / a)
    mode = xm

    return {
        "mean": mean,
        "var": var,
        "skew": skew,
        "excess_kurtosis": excess_kurt,
        "entropy": entropy,
        "median": median,
        "mode": mode,
    }


alpha0, xm0 = 2.5, 1.0
theo = pareto_moments(alpha0, xm0)

dist = stats.pareto(b=alpha0, loc=0, scale=xm0)
m, v, s, k = dist.stats(moments="mvsk")

theo, {
    "scipy_mean": float(m),
    "scipy_var": float(v),
    "scipy_skew": float(s),
    "scipy_excess_kurt": float(k),
    "scipy_entropy": float(dist.entropy()),
}
({'mean': 1.6666666666666667,
  'var': 2.2222222222222223,
  'skew': nan,
  'excess_kurtosis': nan,
  'entropy': 0.483709268125845,
  'median': 1.3195079107728942,
  'mode': 1.0},
 {'scipy_mean': 1.6666666666666667,
  'scipy_var': 2.2222222222222223,
  'scipy_skew': nan,
  'scipy_excess_kurt': nan,
  'scipy_entropy': 0.4837092681258448})
# Moment existence in action: running averages can be unstable when moments are infinite.
def running_mean(x: np.ndarray) -> np.ndarray:
    return np.cumsum(x) / np.arange(1, len(x) + 1)


xm_demo = 1.0
alphas_demo = [0.8, 1.5, 3.0]
n_demo = 30_000

fig = go.Figure()

for a in alphas_demo:
    x = pareto_rvs(alpha=a, xm=xm_demo, size=n_demo, rng=rng)
    rm = running_mean(x)
    fig.add_trace(
        go.Scatter(
            x=np.arange(1, n_demo + 1),
            y=rm,
            mode="lines",
            name=f"alpha={a}",
        )
    )

fig.update_layout(
    title="Running sample mean for different tail indices (same xm=1)",
    xaxis_title="n",
    yaxis_title="running mean",
    yaxis_type="log",
    width=950,
    height=450,
)
fig

5) Parameter Interpretation#

5.1 Shape \(\alpha\) (tail index)#

The survival function is

\[\Pr(X>x) = \left(\frac{x_m}{x}\right)^{\alpha}.\]

So \(\alpha\) directly controls the tail heaviness:

  • smaller \(\alpha\) → heavier tail → extreme values are more common

  • larger \(\alpha\) → lighter tail → extremes decay faster

Moment existence is controlled by \(\alpha\):

  • mean exists iff \(\alpha>1\)

  • variance exists iff \(\alpha>2\)

  • skewness exists iff \(\alpha>3\)

  • excess kurtosis exists iff \(\alpha>4\)
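SciPy reflects these thresholds directly; e.g. for a tail index of \(1.5\) (a value assumed here for illustration) the mean is finite but the variance is reported as infinite:

```python
import numpy as np
from scipy import stats

# b = 1.5 with xm = 1: mean = alpha*xm/(alpha-1) = 3, variance does not exist.
m, v = stats.pareto(b=1.5, loc=0, scale=1.0).stats(moments="mv")

assert np.isclose(float(m), 3.0)
assert np.isinf(float(v))
```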

5.2 Scale/minimum \(x_m\)#

The parameter \(x_m\) is a hard lower bound. Increasing \(x_m\) scales the distribution to the right.

If \(X\sim\mathrm{Pareto}(\alpha, x_m)\) then \(X/x_m\sim\mathrm{Pareto}(\alpha, 1)\).
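This scale invariance can be verified at the distribution level (a minimal deterministic check, with assumed values): the CDF of \(X/x_m\) must match the standard Pareto CDF at every point.

```python
import numpy as np
from scipy import stats

# P(X/xm <= z) = P(X <= xm*z) should equal the Pareto(alpha, 1) CDF at z >= 1.
alpha, xm = 2.0, 5.0
z = np.geomspace(1.0, 50.0, 40)

scaled_cdf = stats.pareto(b=alpha, loc=0, scale=xm).cdf(xm * z)
std_cdf = stats.pareto(b=alpha, loc=0, scale=1.0).cdf(z)

assert np.allclose(scaled_cdf, std_cdf)
```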

# Shape changes: PDF curves for different alpha
xm = 1.0
x = np.geomspace(xm, 80 * xm, 600)

alphas = [0.8, 1.2, 2.0, 5.0]

fig = go.Figure()
for a in alphas:
    fig.add_trace(go.Scatter(x=x, y=pareto_pdf(x, a, xm), mode="lines", name=f"alpha={a}"))

fig.update_layout(
    title="Pareto PDF for different alpha (xm=1)",
    xaxis_title="x",
    yaxis_title="density",
    xaxis_type="log",
    yaxis_type="log",
    width=950,
    height=450,
)
fig
# Tail straight line on log-log scale: survival function
xm = 1.0
x = np.geomspace(xm, 500 * xm, 600)

alphas = [1.2, 2.0, 4.0]

fig = go.Figure()
for a in alphas:
    fig.add_trace(go.Scatter(x=x, y=pareto_sf(x, a, xm), mode="lines", name=f"alpha={a}"))

fig.update_layout(
    title="Pareto survival function P(X>x): log-log straight lines",
    xaxis_title="x",
    yaxis_title="P(X>x)",
    xaxis_type="log",
    yaxis_type="log",
    width=950,
    height=450,
)
fig

6) Derivations#

6.1 Expectation (general \(k\)-th moment)#

For \(k\in\mathbb{R}\):

\[ \mathbb{E}[X^k] = \int_{x_m}^{\infty} x^k\,\alpha x_m^{\alpha}x^{-(\alpha+1)}\,dx = \alpha x_m^{\alpha}\int_{x_m}^{\infty} x^{k-\alpha-1}\,dx. \]

The integral converges iff \(k-\alpha-1 < -1\), i.e. iff \(k<\alpha\). When it converges:

\[ \alpha x_m^{\alpha}\int_{x_m}^{\infty} x^{k-\alpha-1}\,dx = \alpha x_m^{\alpha}\left[\frac{x^{k-\alpha}}{k-\alpha}\right]_{x_m}^{\infty} = \frac{\alpha x_m^k}{\alpha-k}. \]

Plugging in \(k=1\) gives the mean (requires \(\alpha>1\)).

6.2 Variance#

For \(\alpha>2\) we have

\[\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2.\]

Using \(\mathbb{E}[X^2] = \frac{\alpha x_m^2}{\alpha-2}\) and \(\mathbb{E}[X]=\frac{\alpha x_m}{\alpha-1}\) yields

\[\mathrm{Var}(X) = \frac{\alpha x_m^2}{(\alpha-1)^2(\alpha-2)}.\]

6.3 Likelihood and MLE#

For i.i.d. data \(x_1,\dots,x_n\) with \(x_i\ge x_m\):

\[ \mathcal{L}(\alpha,x_m) = \prod_{i=1}^n \alpha x_m^{\alpha} x_i^{-(\alpha+1)}. \]

The log-likelihood is

\[ \ell(\alpha,x_m)= n\log\alpha + n\alpha\log x_m - (\alpha+1)\sum_{i=1}^n \log x_i, \]

with the constraint \(x_m\le \min_i x_i\).

If \(x_m\) is known, differentiate w.r.t. \(\alpha\) and set to zero:

\[ \frac{\partial \ell}{\partial \alpha} = \frac{n}{\alpha} + n\log x_m - \sum_{i=1}^n \log x_i = 0 \quad\Rightarrow\quad \hat{\alpha} = \frac{n}{\sum_{i=1}^n \log(x_i/x_m)}. \]

If \(x_m\) is unknown, the likelihood increases with \(x_m\) (for fixed \(\alpha\)) subject to \(x_m\le \min_i x_i\), so

\[\hat{x}_m = \min_i x_i,\]

and then plug into the formula for \(\hat{\alpha}\).

Log-scale trick

Define \(Y_i = \log(x_i/x_m)\). Under the Pareto model, \(Y_i\sim\mathrm{Exp}(\text{rate}=\alpha)\). Inference for \(\alpha\) can be done using exponential-family tools.

def pareto_mle_alpha(x: np.ndarray, xm: float) -> float:
    x = np.asarray(x, dtype=float)
    xm = float(xm)
    if xm <= 0:
        raise ValueError("xm must be > 0")
    if np.any(x < xm):
        raise ValueError("All observations must satisfy x >= xm")
    s = np.sum(np.log(x / xm))
    if s <= 0:
        raise ValueError("Sum log(x/xm) must be positive")
    return len(x) / s


def pareto_mle(x: np.ndarray) -> tuple[float, float]:
    x = np.asarray(x, dtype=float)
    xm_hat = float(np.min(x))
    alpha_hat = pareto_mle_alpha(x, xm_hat)
    return alpha_hat, xm_hat


# Demonstrate MLE on synthetic data
alpha_true, xm_true = 2.5, 1.0
n = 4000
x = pareto_rvs(alpha=alpha_true, xm=xm_true, size=n, rng=rng)

alpha_hat, xm_hat = pareto_mle(x)
scipy_b_hat, scipy_loc_hat, scipy_scale_hat = stats.pareto.fit(x, floc=0)

{
    "true": (alpha_true, xm_true),
    "mle": (alpha_hat, xm_hat),
    "scipy_fit_floc0": (float(scipy_b_hat), float(scipy_scale_hat)),
}
{'true': (2.5, 1.0),
 'mle': (2.523314680630405, 1.000025688150666),
 'scipy_fit_floc0': (2.523314680630405, 1.000025688150666)}
# Likelihood shape in alpha (xm fixed at its MLE)
def pareto_loglik_alpha(alpha: float, x: np.ndarray, xm: float) -> float:
    a = float(alpha)
    if a <= 0:
        return -np.inf
    if np.any(x < xm):
        return -np.inf
    n = len(x)
    return n * np.log(a) + n * a * np.log(xm) - (a + 1) * np.sum(np.log(x))


grid = np.linspace(0.2, 8.0, 400)
ll = np.array([pareto_loglik_alpha(a, x, xm_hat) for a in grid])

fig = px.line(x=grid, y=ll, title="Log-likelihood as a function of alpha (xm fixed)")
fig.add_vline(x=alpha_hat, line_dash="dash", line_color="black")
fig.update_layout(xaxis_title="alpha", yaxis_title="log-likelihood", width=950, height=420)
fig

7) Sampling & Simulation#

7.1 Inverse transform sampling (NumPy-only)#

From the CDF \(F(x)=1-(x_m/x)^{\alpha}\) for \(x\ge x_m\):

\[U = F(X) = 1 - \left(\frac{x_m}{X}\right)^{\alpha}\]

Solve for \(X\):

\[X = x_m(1-U)^{-1/\alpha},\qquad U\sim\mathrm{Unif}(0,1).\]

Algorithm

  1. Draw \(U\sim\mathrm{Unif}(0,1)\).

  2. Return \(X = x_m(1-U)^{-1/\alpha}\).

7.2 Exponential connection (also NumPy-only)#

Since \(\log(X/x_m)\sim\mathrm{Exp}(\text{rate}=\alpha)\), you can also sample via

\[X = x_m\exp(E/\alpha),\qquad E\sim\mathrm{Exp}(1).\]
def pareto_rvs_via_exponential(alpha: float, xm: float, size: int | tuple[int, ...], rng: np.random.Generator) -> np.ndarray:
    e = rng.exponential(scale=1.0, size=size)
    return float(xm) * np.exp(e / float(alpha))


# Validate that both samplers agree (Monte Carlo)
alpha0, xm0 = 2.0, 1.0
n = 100_000

x1 = pareto_rvs(alpha0, xm0, size=n, rng=rng)
x2 = pareto_rvs_via_exponential(alpha0, xm0, size=n, rng=rng)

# Compare a few quantiles
qs = [0.5, 0.9, 0.99]
q1 = np.quantile(x1, qs)
q2 = np.quantile(x2, qs)
qt = pareto_ppf(np.array(qs), alpha0, xm0)

qs, q1, q2, qt
([0.5, 0.9, 0.99],
 array([1.415793, 3.132384, 9.755368]),
 array([1.413034, 3.147542, 9.814283]),
 array([ 1.414214,  3.162278, 10.      ]))
# The log-scale exponential relationship
alpha0, xm0 = 3.0, 1.0
x = pareto_rvs(alpha0, xm0, size=50_000, rng=rng)
y = np.log(x / xm0)

grid = np.linspace(0, np.quantile(y, 0.995), 500)
exp_pdf = alpha0 * np.exp(-alpha0 * grid)  # Exp(rate=alpha0)

fig = go.Figure()
fig.add_trace(go.Histogram(x=y, nbinsx=80, histnorm="probability density", name="log(X/xm)", opacity=0.6))
fig.add_trace(go.Scatter(x=grid, y=exp_pdf, mode="lines", name="Exp(rate=alpha)", line=dict(width=3)))

fig.update_layout(
    title="If X ~ Pareto(alpha,xm), then log(X/xm) ~ Exponential(rate=alpha)",
    xaxis_title="y = log(X/xm)",
    yaxis_title="density",
    width=950,
    height=420,
)
fig

8) Visualization#

We’ll visualize:

  • PDF (log–log to emphasize tail)

  • CDF

  • Monte Carlo samples vs theory

alpha, xm = 2.5, 1.0
x = np.geomspace(xm, 200 * xm, 700)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=pareto_pdf(x, alpha, xm), mode="lines", name="PDF"))
fig.update_layout(
    title=f"Pareto PDF (alpha={alpha}, xm={xm})",
    xaxis_title="x",
    yaxis_title="density",
    xaxis_type="log",
    yaxis_type="log",
    width=950,
    height=420,
)
fig
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=pareto_cdf(x, alpha, xm), mode="lines", name="CDF"))
fig.update_layout(
    title=f"Pareto CDF (alpha={alpha}, xm={xm})",
    xaxis_title="x",
    yaxis_title="F(x)",
    xaxis_type="log",
    width=950,
    height=420,
)
fig
# Monte Carlo: histogram + theoretical PDF overlay
n = 80_000
xs = pareto_rvs(alpha, xm, size=n, rng=rng)

x_max = np.quantile(xs, 0.995)
x_grid = np.geomspace(xm, x_max, 600)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=xs,
        nbinsx=120,
        histnorm="probability density",
        name="samples",
        opacity=0.6,
    )
)
fig.add_trace(go.Scatter(x=x_grid, y=pareto_pdf(x_grid, alpha, xm), mode="lines", name="theory", line=dict(width=3)))

fig.update_layout(
    title=f"Samples vs theory (alpha={alpha}, xm={xm})",
    xaxis_title="x",
    yaxis_title="density",
    xaxis_type="log",
    yaxis_type="log",
    width=950,
    height=450,
)
fig

9) SciPy Integration#

SciPy’s Pareto is used as:

dist = scipy.stats.pareto(b=alpha, loc=0, scale=xm)

With loc=0, this matches the textbook Pareto(Type I) on \([x_m,\infty)\).

alpha, xm = 2.5, 1.0
dist = stats.pareto(b=alpha, loc=0, scale=xm)

x = np.geomspace(xm, 50 * xm, 300)

# pdf / cdf
pdf_vals = dist.pdf(x)
cdf_vals = dist.cdf(x)

# sampling
xs = dist.rvs(size=5, random_state=rng)

pdf_vals[:3], cdf_vals[:3], xs
(array([2.5     , 2.388099, 2.281208]),
 array([0.      , 0.03218 , 0.063325]),
 array([1.535648, 1.472583, 1.287302, 1.250062, 1.510556]))
# Fitting
alpha_true, xm_true = 2.0, 1.5
x = stats.pareto(b=alpha_true, loc=0, scale=xm_true).rvs(size=5000, random_state=rng)

# If you allow loc to float, it may absorb part of the minimum; usually fix loc=0.
b_hat, loc_hat, scale_hat = stats.pareto.fit(x)  # free loc and scale
b_hat0, loc_hat0, scale_hat0 = stats.pareto.fit(x, floc=0)  # force loc=0

# Compare to MLE (xm_hat=min(x))
alpha_mle, xm_mle = pareto_mle(x)

{
    "true": (alpha_true, xm_true),
    "fit_free_loc": (float(b_hat), float(loc_hat), float(scale_hat)),
    "fit_floc0": (float(b_hat0), float(loc_hat0), float(scale_hat0)),
    "mle": (alpha_mle, xm_mle),
}
{'true': (2.0, 1.5),
 'fit_free_loc': (1.9473724039537028, 0.02377036106946595, 1.476315778565303),
 'fit_floc0': (1.9680604915855544, 0.0, 1.5000861396347691),
 'mle': (1.9680604915855544, 1.5000861396347691)}

10) Statistical Use Cases#

10.1 Hypothesis testing / confidence intervals (tail index)#

With known \(x_m\), define \(Y_i = \log(x_i/x_m)\). Under the Pareto model, \(Y_i\sim\mathrm{Exp}(\text{rate}=\alpha)\).

Let \(S = \sum_i Y_i\). Then \(S\sim\mathrm{Gamma}(\text{shape}=n,\text{rate}=\alpha)\), and

\[2\alpha S \sim \chi^2_{2n}.\]

This gives an exact confidence interval for \(\alpha\) by inverting chi-square quantiles.

10.2 Bayesian modeling#

Again with \(Y_i=\log(x_i/x_m)\), the likelihood is exponential with rate \(\alpha\).

A conjugate prior is Gamma:

\[\alpha \sim \mathrm{Gamma}(a_0, b_0)\quad\text{(shape–rate)}.\]

Posterior is

\[\alpha\mid\text{data} \sim \mathrm{Gamma}(a_0+n,\; b_0 + \sum_i Y_i).\]

10.3 Generative modeling (Pareto principle)#

For \(\alpha>1\), the Pareto distribution implies a Lorenz curve

\[L(p) = 1 - (1-p)^{(\alpha-1)/\alpha}.\]

The share of total mass held by the top fraction \(q\) is

\[\text{top-share}(q) = q^{(\alpha-1)/\alpha}.\]

This connects \(\alpha\) directly to “80/20”-style statements.

from scipy.stats import chi2


# Hypothesis test + exact CI for alpha (assuming xm known)
alpha_true, xm = 2.2, 1.0
n = 800

x = stats.pareto(b=alpha_true, loc=0, scale=xm).rvs(size=n, random_state=rng)
y = np.log(x / xm)
S = float(np.sum(y))

alpha_hat = n / S

# 95% exact CI via chi-square inversion
delta = 0.05
q_low = chi2.ppf(delta / 2, df=2 * n)
q_high = chi2.ppf(1 - delta / 2, df=2 * n)
alpha_low = q_low / (2 * S)
alpha_high = q_high / (2 * S)

# Two-sided p-value for H0: alpha = alpha0
alpha0 = 2.0
T = 2 * alpha0 * S
p_two_sided = 2 * min(chi2.cdf(T, df=2 * n), 1 - chi2.cdf(T, df=2 * n))

{
    "alpha_hat": alpha_hat,
    "ci_95": (alpha_low, alpha_high),
    "test_H0_alpha=2.0_p": p_two_sided,
}
{'alpha_hat': 2.1500942564639236,
 'ci_95': (2.003664756067939, 2.301614469333093),
 'test_H0_alpha=2.0_p': 0.04437503708196632}
# Bayesian update for alpha with a Gamma prior (shape-rate)
a0, b0 = 2.0, 1.0  # weak-ish prior

a_post = a0 + n
b_post = b0 + S

prior = stats.gamma(a=a0, scale=1 / b0)
post = stats.gamma(a=a_post, scale=1 / b_post)

grid = np.linspace(0.2, np.quantile(post.rvs(size=50_000, random_state=rng), 0.995), 600)

fig = go.Figure()
fig.add_trace(go.Scatter(x=grid, y=prior.pdf(grid), mode="lines", name=f"Prior Gamma({a0:.0f},{b0:.0f})"))
fig.add_trace(go.Scatter(x=grid, y=post.pdf(grid), mode="lines", name=f"Posterior Gamma({a_post:.0f},{b_post:.1f})", line=dict(width=3)))

fig.update_layout(
    title="Posterior over alpha (Pareto tail index)",
    xaxis_title="alpha",
    yaxis_title="density",
    width=950,
    height=420,
)
fig
# Generative modeling: calibrate alpha to match a target top-share (e.g. 80/20)
target_q = 0.2
target_share = 0.8

# Solve target_share = q^((alpha-1)/alpha)
# Let r = (alpha-1)/alpha = 1 - 1/alpha.
r = np.log(target_share) / np.log(target_q)
alpha_8020 = 1 / (1 - r)

alpha_8020
1.160964047443681
# Simulate and compare empirical top-share and Lorenz curve
alpha, xm = float(alpha_8020), 1.0
n = 200_000
w = pareto_rvs(alpha, xm, size=n, rng=rng)

# Empirical top-q share
q = 0.2
cutoff = np.quantile(w, 1 - q)
top_share_emp = w[w >= cutoff].sum() / w.sum()

# Theoretical top share
top_share_theory = q ** ((alpha - 1) / alpha)

top_share_emp, top_share_theory
(0.776595898635988, 0.8)
# Lorenz curve: empirical vs theoretical
w_sorted = np.sort(w)
cum = np.cumsum(w_sorted)
cum = np.insert(cum, 0, 0.0)
cum = cum / cum[-1]

p = np.linspace(0, 1, len(cum))
lorenz_theory = 1 - (1 - p) ** ((alpha - 1) / alpha)

fig = go.Figure()
fig.add_trace(go.Scatter(x=p, y=cum, mode="lines", name="empirical"))
fig.add_trace(go.Scatter(x=p, y=lorenz_theory, mode="lines", name="theoretical", line=dict(dash="dash")))
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 1], mode="lines", name="equality", line=dict(color="black", dash="dot")))

fig.update_layout(
    title=f"Lorenz curve for Pareto(alpha={alpha:.3f})",
    xaxis_title="population share p",
    yaxis_title="wealth share L(p)",
    width=950,
    height=450,
)
fig

11) Pitfalls#

  • Invalid parameters: \(\alpha\le 0\) or \(x_m\le 0\) is not a valid Pareto distribution.

  • Infinite moments are not a corner case:

    • if \(\alpha\le 1\), the mean is infinite

    • if \(1<\alpha\le 2\), the mean exists but the variance is infinite. Many moment-based summaries (sample mean, sample variance) can be unstable.

  • Threshold choice (\(x_m\)) in real data: Pareto is usually a tail model. Choosing the cutoff is a modeling decision; fitting a Pareto to the entire distribution can be misleading.

  • Fitting with loc free in SciPy: letting loc float can absorb part of the minimum and change the interpretation. If you want the textbook model, use floc=0.

  • Numerical overflow in simulation/plots: for small \(\alpha\), extremely large samples occur. Use log-scales and consider clipping for visualization (never for inference).

12) Summary#

  • pareto is a continuous distribution on \([x_m,\infty)\) with power-law tail \(\Pr(X>x)=(x_m/x)^{\alpha}\).

  • The tail index \(\alpha\) controls both extremes and moment existence (mean requires \(\alpha>1\), variance requires \(\alpha>2\), …).

  • Sampling is simple via inverse CDF: \(X=x_m(1-U)^{-1/\alpha}\).

  • With known \(x_m\), inference for \(\alpha\) becomes inference for an exponential rate on \(\log(X/x_m)\).

  • SciPy provides a solid reference implementation: scipy.stats.pareto.

References

  • SciPy documentation: scipy.stats.pareto

  • Embrechts, Klüppelberg, Mikosch (1997). Modelling Extremal Events.

  • Newman (2005). Power laws, Pareto distributions and Zipf’s law.